Reward Augmented Maximum Likelihood for Neural Structured Prediction
Authors
Abstract
A key problem in structured output prediction is direct optimization of the task reward function that matters for test evaluation. This paper presents a simple and computationally efficient approach to incorporate task reward into a maximum likelihood framework. We establish a connection between the log-likelihood and regularized expected reward objectives, showing that at a zero temperature, they are approximately equivalent in the vicinity of the optimal solution. We show that optimal regularized expected reward is achieved when the conditional distribution of the outputs given the inputs is proportional to their exponentiated (temperature adjusted) rewards. Based on this observation, we optimize conditional log-probability of edited outputs that are sampled proportionally to their scaled exponentiated reward. We apply this framework to optimize edit distance in the output label space. Experiments on speech recognition and machine translation for neural sequence to sequence models show notable improvements over a maximum likelihood baseline by using edit distance augmented maximum likelihood.
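For concreteness, the sampling-then-maximum-likelihood recipe described in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it restricts edits to substitutions (so the number of outputs at each edit distance has a closed form), and sample_edit_count, raml_sample, and the model.log_prob call are hypothetical names.

import math
import random

def sample_edit_count(seq_len, vocab_size, tau, max_edits):
    # RAML samples an edit distance e with probability proportional to
    # c(e) * exp(-e / tau), where c(e) counts outputs at distance e.
    # For substitution-only edits: c(e) = C(n, e) * (V - 1)^e.
    weights = [math.comb(seq_len, e) * (vocab_size - 1) ** e * math.exp(-e / tau)
               for e in range(max_edits + 1)]
    return random.choices(range(max_edits + 1), weights=weights, k=1)[0]

def raml_sample(y_star, vocab, tau=0.9):
    # Draw an edited output with probability roughly proportional to
    # exp(reward / tau), where reward = -edit_distance to y_star.
    # Assumes len(vocab) >= 2 so a differing substitute always exists.
    n = len(y_star)
    e = sample_edit_count(n, len(vocab), tau, max_edits=n)
    y = list(y_star)
    for pos in random.sample(range(n), e):
        y[pos] = random.choice([t for t in vocab if t != y[pos]])
    return y

def raml_loss(model, x, y_star, vocab, tau=0.9):
    # One RAML step is ordinary maximum likelihood, but on the sampled
    # target instead of the ground truth; model.log_prob is an assumed
    # seq2seq API returning the conditional log-likelihood.
    y_tilde = raml_sample(y_star, vocab, tau)
    return -model.log_prob(y_tilde, x)

As tau approaches zero the sampled targets concentrate on the ground truth, and the procedure reduces to plain maximum likelihood training.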
Similar Resources
Softmax Q-Distribution Estimation for Structured Prediction: A Theoretical Interpretation for RAML
Reward augmented maximum likelihood (RAML), a simple and effective learning framework to directly optimize towards the reward function in structured prediction tasks, has led to a number of impressive empirical successes. RAML incorporates task-specific reward by performing maximum-likelihood updates on candidate outputs sampled according to an exponentiated payoff distribution, which gives hig...
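In symbols, the exponentiated payoff distribution mentioned above, and the RAML objective built on it, can be written as follows (tau is the temperature, r the task reward; notation mirrors the RAML paper):

q(\mathbf{y} \mid \mathbf{y}^{*}; \tau) = \frac{\exp\{ r(\mathbf{y}, \mathbf{y}^{*}) / \tau \}}{\sum_{\mathbf{y}'} \exp\{ r(\mathbf{y}', \mathbf{y}^{*}) / \tau \}}

\mathcal{L}_{\mathrm{RAML}}(\theta) = \sum_{(\mathbf{x}, \mathbf{y}^{*})} \; \mathbb{E}_{\mathbf{y} \sim q(\cdot \mid \mathbf{y}^{*}; \tau)} \big[ -\log p_{\theta}(\mathbf{y} \mid \mathbf{x}) \big]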
Neural Symbolic Machines: Learning Semantic Parsers on Freebase with Weak Supervision
Harnessing the statistical power of neural networks to perform language understanding and symbolic reasoning is difficult when it requires executing efficient discrete operations against a large knowledge base. In this work, we introduce a Neural Symbolic Machine (NSM), which contains (a) a neural “programmer”, i.e., a sequence-to-sequence model that maps language utterances to programs and ut...
Maximum Margin Reward Networks for Learning from Explicit and Implicit Supervision
Neural networks have achieved state-of-the-art performance on several structured-output prediction tasks, trained in a fully supervised fashion. However, annotated examples in structured domains are often costly to obtain, which thus limits the applications of neural networks. In this work, we propose Maximum Margin Reward Networks, a neural network-based framework that aims to learn from both exp...
The More the Merrier: Parameter Learning for Graphical Models with Multiple MAPs
Conditional random fields (CRFs) are a popular and effective approach to structured prediction. When the underlying structure does not have a small tree-width, maximum likelihood estimation (MLE) is in general computationally hard. Discriminative methods such as Perceptron or Max-Margin Markov Networks circumvent this problem by requiring the MAP assignment only, which is often more tractable, ei...
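The tractability gap described above is visible directly in the updates. Assuming a log-linear model with feature map phi (a standard setup, not specific to this paper), the CRF log-likelihood gradient requires expectations under the model, i.e. marginal inference:

\nabla_{\theta} \log p_{\theta}(y \mid x) = \phi(x, y) - \mathbb{E}_{y' \sim p_{\theta}(\cdot \mid x)} \big[ \phi(x, y') \big]

whereas a structured perceptron update needs only the MAP assignment:

\hat{y} = \arg\max_{y'} \theta^{\top} \phi(x, y'), \qquad \theta \leftarrow \theta + \phi(x, y) - \phi(x, \hat{y})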
SEARNN: Training RNNs with Global-Local Losses
We propose SEARNN, a novel training algorithm for recurrent neural networks (RNNs) inspired by the “learning to search” (L2S) approach to structured prediction. RNNs have been widely successful in structured prediction applications such as machine translation or parsing, and are commonly trained using maximum likelihood estimation (MLE). Unfortunately, this training loss is not always an approp...